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Abstract 

Background: Plants have played a special role in inositol polyphosphate (IP) research since in plant seeds was 
discovered the first IP, the fully phosphorylated inositol ring of phytic acid (IP6). It is now known that phytic acid is 
further metabolized by the IP6 Kinases (IP6Ks) to generate IP containing pyro-phosphate moiety. The IP6K are 
evolutionary conserved enzymes identified in several mammalian, fungi and amoebae species. Although IP6K has 
not yet been identified in plant chromosomes, there are many clues suggesting its presences in vegetal cells. 

Results: In this paper we propose a new approach to search for the plant IP6K gene, that lead to the identification 
in plant genome of a nucleotide sequence corresponding to a specific tag of the IP6K family. Such a tag has been 
found in all IP6K genes identified up to now, as well as in all genes belonging to the Inositol Polyphosphate 
Kinases superfamily (JPK). The tag sequence corresponds to the inositol-binding site of the enzyme, and it can be 
considered as characterizing all IPK genes. To this aim we applied a technique based on motif discovery. We 
exploited DLSME, a software recently proposed, which allows for the motif structure to be only partially specified 
by the user. First we applied the new method on mitochondrial DNA (mtDNA) of plants, where such a gene could 
have been nested, possibly encrypted and hidden by virtue of the editing and/or trans-splicing processes. Then we 
looked for the gene in nuclear genome of two model plants, Arabidopsis thaliana and Oryza sativa. 

Conclusions: The analysis we conducted in plant mitochondria provided the negative, though we argue relevant, 
result that IP6K does not actually occur in vegetable mtDNA. Very interestingly, the tag search in nuclear genomes 
lead us to identify a promising sequence in chromosome 5 of Oryza sativa. Further analyses are in course to 
confirm that this sequence actually corresponds to IP6K mammalian gene. 



Background 

Plants have played a special role in inositol polypho- 
sphates research ever since the first inositol polypho- 
sphate (inositol hexakisphosphate) was described about 
90 years ago in plant seeds [1]. Interest in inositol poly- 
phosphates dramatically increased about thirty years ago 
when the role of inositol 1,4,5-trisphosphate (Ins(l,4,5) 
P3) in mobilization of Ca 2+ from intracellular stores was 
discovered [2]. It is now clear that inositol polypho- 
sphates are an important class of signaling molecules 
controlling disparate cellular functions. Inositol hexaki- 
sphosphate (IP6, also known as phytic acid) is the most 
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abundant inositol polyphosphate in eukaryotic cells. It is 
a major component of plant seeds representing 0,1 - 1% 
of its dry weight and 60 - 80% of total phosphate con- 
tent [3]. 

Significantly, IP6 is the precursor of a novel class of 
more anionic inositol polyphosphates, the inositol pyro- 
phosphates, in which the fully phosphorylated IP6 ring 
is further phosphorylated to create high-energy pyro- 
phosphate groups. The best charateryzed inositol pyro- 
phosphates are the diphosphoinositol pentakisphosphate 
(IP7 or PP-IP5) and the bis-diphosphoinositol tetraki- 
sphosphate (IP8 or [PP]2-IP4), with one and two pyro- 
phosphate group, respectively [4]. 

Inositol pyrophosphates are important cellular mes- 
sengers that control a wide range of cellular function, 
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including endocytosis [5], apoptosis [6], telomere length 
[7], DNA recombination [8]. The high energy pyropho- 
sphate bond of IP7 can directly donate the beta phos- 
phate to proteins defining a new kind of protein 
phosphorylation mechanism [9], recently proposed to 
represent a novel post transductional protein modifica- 
tion (serine pyro-phosphorylation) [10]. 

Since their discovery in the early 1990s, inositol pyro- 
phosphates have been found in all eukaryotic cells ana- 
lyzed, from yeast to mammalian neuron, along with the 
widespread conservation of the enzymes responsible for 
their synthesis. 

The mammalian enzymes responsible for IP7 synthesis 
are called IP6 Kinases (IP6Ks); they are able to convert 
IP6 plus ATP to IP7 and ADP [11]. It is now known 
that IP6Ks belong to a superfamily of Inositol Polypho- 
sphate Kinases (PFAM accession number PF03770), that 
evolved from a common ancestor, comprising IP6Ks, 
Inositol Polyphosphate Multikinase (IPMK) possessing a 
broad range of substrates and IP3-3Ks that specifically 
convert I(1,4,5)P3 to I(1,3,4,5)P4. Although IP6K has not 
yet been identified in plant genomes, the presence of 
pyrophosphate IP7 has been demonstrated also in vege- 
tal organisms, both in monocotyledonous and in dicoty- 
ledonous plants [12,13]. Furthermore, the conversion of 
IP6 to IP7 has been detected in Arabidopsis cells and 
leaf tissue in the presence of ATP, demonstrating IP6- 
kinase activity in plant extracts [14]. These findings, 
together with the observed high conservation through 
the evolution of IP6K, strongly suggest the presence of 
this enzyme in vegetal cells. Therefore, IP6K enzyme 
was searched in plant genomes by homology based 
methods, but all studies have failed to reveal its pre- 
sence. Two IPMK proteins (called AtIPK2a and AtIPK2b 
in Arabidopsis thaliana) have been identified so far 
[15,16]. These two enzymes contribute to inositol 
l,3,4,5,6-pentakisphosphate(IP5) production in Arabi- 
dopsis, but do not show any inositol pyrophosphate 
enzymatic activity [15,16]. In rice and barley an IPMK 
able to phosphorylate all intermediates from inositol 
bisphosphate to IP6 has been characterized [17]. 
To date IPMKs have been identified in dicotyledonous 
and in monocotyledonous plants, as well as in algae. 
There are many clues connecting IP6K to cell mito- 
chondria. It was shown that human IP6K2 moves from 
nuclei to mitochondria and provides physiologic regula- 
tion of apoptotic process by generating IP7 [18]. 
Furthermore, yeast deficient in KCS1 (yeast IP6-Kinase), 
kcslA, do not survive if they are grown in conditions in 
which survival is dependent from mitochondrial func- 
tion, thus demonstrating the importance of IP6K for 
this organelles [19]. Summarizing, to date IP6K has not 
been identified in plant chromosomes, but there are 
many clues suggesting its presence in vegetal cells. 



Some further observations could suggest that the corre- 
sponding gene might be found in plant mtDNA, prob- 
ably encripted and hidden by virtue of editing and/or 
trans splicing processes. It is known that most of 
mtDNA information concerns genie products acting 
inside the mitochondrion itself. Plant mitochondrial 
genomes have several peculiar characteristics such as 
the large size (from 200Kb to 2400Kb), the presence of 
introns and genetic material of chloroplast or nuclear 
origin [20]. 

Furthermore, mitochondrial genome is characterized 
by occurrence of RNA editing and trans splicing 
mechanism enlarging protein variability [21]. RNA edit- 
ing is a process in which some bases of an RNA mole- 
cule are enzymatically modified, so that its information 
content can be altered. Many molecular editing mechan- 
isms are known, but in plants the most frequent is cyti- 
dine to uridine transformation. In plant mitochondria 
RNA editing is very common and it is required for gene 
expression. Actually the genomic information encoding 
an open reading frame is often incomplete in these 
organelles, and RNA editing is necessary to yield a func- 
tional product. The amino acid sequence of the encoded 
protein is effectively altered after editing process, so that 
it differs from that predicted by the genomic DNA 
sequence. Trans splicing is a further process generating 
genetic variability, in which two RNA molecules, pro- 
duced by different DNA regions (even very distant from 
one another), are joined in a single RNA molecule able 
to produce a protein. 

On the basis of the above considerations, we decided 
to search IP6K gene in mtDNA of plants as well as in 
nuclear chromosomes of two model vegetal organisms. 
Thus, we analyzed all published mtDNA of plant and 
the whole nuclear genome of Arabidopsis thaliana and 
Oryza sativa, a dicotyledonous and a monocotyledonous 
plant respectively. Arabidopsis thaliana is a small flow- 
ering plant, belonging to eudicot, the largest group of 
flowering plants on the planet. Because of its short gen- 
eration time and compact size, it is used as a model 
organism in plant biology and genetics. Its nuclear gen- 
ome comprises five chromosomes, with a total size of 
approximately 125 Mb (megabases). It is one of the 
smallest genome among plants, and it was the first plant 
genome to be sequenced in 2000 [22]. Oryza sativa 
(rice) was the second plant genome to be published 
[23], the first among monocot. It has the smallest cereal 
genome consisting of just 430 Mb across 12 chromo- 
somes and it is routinely used as a model organism in 
cereal genomics. 

Because of the considerable sequence heterogeneity 
among the several known IPKs, common homology 
search programs are not useful to our aim. Thus, we 
decided to use a new approach, looking not for the gene 
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sequence as whole, but for a specific tag sequence, char- 
acterizing IPK gene family. This is possible only when a 
gene, or a gene family, contains a region (usually a short 
sequence) that is indispensable and always present in 
the gene sequence. In fact, alignment studies between 
IPKs from different organisms allowed to identify several 
conserved motifs in the amino acids sequence. These 
motifs comprise the ATP binding site, first characterized 
in IP3-3K [24], the C-terminal motif (last 19 amino 
acids), important for the catalytic activity [25], the 
"SSLL" motif, required for enzymatic activity of IP6K 
[26] and the P-XXX-D-X-K-X-G tag, a sequence of nine 
amino acids with four of them very conserved among 
IPKs [11]. Despite the considerable sequence heteroge- 
neity of IPKs, this last motif represents a unique consen- 
sus sequence and it can be considered a specific tag of 
IPK gene family. The consensus sequence P-XXX-D-X- 
K-X-G is a very important functional region, identifying 
the inositol binding site of the enzyme [27]. Here, the 
functional role explains its strong conservation through 
evolution. To search for the IPK family tag in plant 
DNA we exploited DLSME [28], a software for motif 
extraction which allows for the motif structure to be 
only partially specified by the user, as better explained 
in the following. 

Methods 

Common softwares for sequence search are based on 
sequence similarity, but they are not very useful when 
the expected homology between the gene searched for 
and the known sequences is low. Furthermore, these 
softwares cannot detect possible changes in nucleotide 
sequences due to RNA editing mechanisms. The intui- 
tion behind our work is that some specific gene families, 
such as all IPK genes, are characterized by the presence 
of specific tags, short sequences of few amino acids, 
often corresponding to functional regions. Thus, we pre- 
sent a general, semi-automatic methodology to discover 
the possible presence of specific, still undiscovered, 
genes in cells and we applied it to plant genomes. 
Such a methodology can be summarized as follows: 

1. {Tag Definition) set a (partially undefined) sequence 
representing the specific tag to search for; 

2. {Genome Scanning) scan a plant genome sequence 
(or a set of genome sequences) to individuate possible 
instances of the tag; 

3. {Post-processing Analysis) analyze the candidate sub- 
sequences extracted by the previous step in order to 
verify the presence of the gene in the considered 
genomes. 

In particular, the sequence associated to the tag 
defined by step 1 is made of both symbols in the alpha- 
bet 2 = {A,C,G,T}, representing nucleic acids, and a gen- 
eric symbol X that can be associated to a subset of 2. 



This way, step 2 can be carried out by performing an 
approximate search of the motif represented by the tag 
sequence. 

In the following we specify how the steps listed above 
have been particularized to achieve our purposes. 

Tag Definition 

The most important tag for IPK gene is the P-XXX-D- 
X-K-X-G motif, corresponding to the inositol binding 
site of the enzyme. Thus, for the identification of IP6K 
gene in plant DNA, we focused on the nucleotide 
sequence corresponding to this specific IPK tag. 

Genome Scanning 

We analyzed all the published mtDNAsequences (avail- 
able at http://www.ncbi.nlm.nih.gov/sites/entrez) and the 
whole nuclear genome of two plants and performed 
motif extraction from them, since a tag can be viewed 
as a subsequence whose structure is not completely spe- 
cified a priori. Among the different algorithms and tools 
available for motif discovery (e.g., see [29-34]), we chose 
DLSME [28] since it is able to handle different complex 
kinds of pattern variabilities, as will be better recalled in 
the following. 

Post-processing Analysis 

For each identified tag, we extracted a sequence of 
about 1200 nucleotides surrounding the consensus 
sequence and examined it as a candidate IP6K gene. 
Nucleotidic sequences were translated in aminoacid 
sequences by using the Transeq [35] software. Then, we 
examined the identified amino acid sequences looking 
for other IP6Ks conserved domains. In order to detect 
possible homologies, we performed sequence alignments 
using ClustalW [36] and BLAST [37]. Finally, using the 
TBLASTX and TBALSTN algorithms, we screened 
expressed sequence tag (EST) databases for proteins 
containing the sequences identified by our tag search. 

In the following, we first provide a brief description of 
DLSME and of the setting we exploited for our pur- 
poses, and then describe the main results we have 
obtained by our analysis. 

Using DLSME 

DLSME [28] is a system designed to mine general kinds 
of motifs where several "exceptions" may be tolerated; 
that is, it is able to handle different complex kinds of 
pattern variabilities. In particular, DLSME is able to 
search for patterns composed of any number of short 
subsequences (boxes, in the following), where the sizes 
of both the conserved regions and the regions between 
two boxes can be specified by the user as intervals ran- 
ging from a minimum to a maximum value. Moreover, 
mismatches are taken into account, as well as "skips" 
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(deletions) and box "swaps" (box invertions), that possi- 
bly affect box occurrences. Furthermore, in DLSME, it 
is possible to specify boxes where some symbols are 
"anchored" to get a fixed value. Despite the complexity 
of the allowed pattern variabilities, the system is able to 
exhibit good performances. 

In particular, for the purposes of this research, we 
looked for the pattern: 

CC{T,C,A,G} GA{T,C}-AA{A,G}-GG 

{T,C,A,G} 

using the sets of DLSME configuration parameters 
reported in Figure 1. 

Results 

Due to the numerous suggestions relating IP6K to cell 
mitochondria, we decided to first perform the IP6K 
gene search on mitochondrial DNA of plants. To date 
the full mitochondrial genome sequence is known for 42 
different vegetal organisms, belonging to various Phyla, 
even very distant from one another from the evolution- 
ary point of view. The specific IP6Ks tag (P-XXX-D-X- 
K-X-G) was searched over the overall sequenced mito- 
chondrial genomes available to date and both DNA 
strands were analyzed. Twenty three genomes out of 42 
gave at least one positive match. Interestingly, we noted 
that some tag sequences (9 amino acids) were identical 
among different organisms. For each identified tag we 
extracted a sequence of about 1200 nucleotides sur- 
rounding it. To find out possible relevant homologies, 
we performed alignments among the sequences found 
in different vegetal organism. All the sequences sharing 
the same tag showed high similarity in the region 



surrounding the consensus sequence, while alignment 
with IP6K known genes (Saccharomyces cerevisiae KCS1 
or human IP6K1) showed only a weak similarity. 
Furthermore, in order to confirm the identity of our 
putative hit, we looked for other IP6Ks conserved motifs 
in the identified putative amino acids sequence like the 
ATP binding site, the C-terminal motif (last 19 amino 
acids), and the "SSLL" motif. These analyzes led us to 
focus on the sequence PVGTDRKGG, that was found in 
mitochondrial genome of Tripsacum dactyloides, Sor- 
ghum bicolour, and three different species of Zea genus 
(Zea mays, Zea perennis and Zea luxurians). Alignment 
between the 410 aminoacid around the PVGTDRKGG 
sequence of Tripsacum dactyloides and the human IP6K 
gene showed an interesting correspondence of the con- 
sensus region (see Figure 2). 

To verify if the Tripsacum dactyloides sequence was 
an actively transcribed gene, we analyzed the Expressed 
Sequence Tags (ESTs) databases. These databases 
include short fragments of DNA derived from a longer 
cDNA sequence and representing part of the expressed 
genome. In order to confirm the expression of our 
mtDNA sequence, we screened EST databases using the 
region surrounding the PVGTDRKGG tag of Tripsacum 
dactyloides. This search failed to find any EST matching 
indicating that our putative hit is unlikely to be tran- 
scribed in mRNA. Finally, we used a region of 50 amino 
acids of Tripsacum dactyloides mtDNA surrounding the 
consensus sequence to perform a multiple alignment 
with corresponding regions of inositol phosphate kinases 
(IPMK, IP6K, IP3-3K) from different organism using 
ClustalW2. As shown in Figure 3, our sequence resulted 



Distance: 


Hamming 


Number of skips: 


0 


Number- of swaps: 


0 


\'tiinb( r of bo.rt .s: 


4 



First box h ngth : 3 
Distance from second box: 9 

First box anchors: CCT 
CCC 
CCA 
CCG 


Second box length: 3 
Distana from third box: 3 
Second box anchors: GAT 
GAC 


Third box length: 3 
Distance from fourth box: 3 

Third box anchors: AAA 
AAG 


Fourth box length: 3 
Fourth box anchors: GGT 
GGC 
GGA 
GGG 



Figure 1 DLSME configuration parameters Configuration parameters exploited for DLSME when we looked for the pattern: CC{T,C,A,G} 
GA{T,Q— AA{A,G}— GG{T,CAG} 
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ScqA Name Lon(aa) ScqB Name Len(aa) Score 



Tripsacum 410 



Human 110 



6 



CLUSTAL 2.0.12 multiple sequence alignment 
Tripsacum -PTEILSEY-K — K 



:».-, 

Human HVVQNSADAGDMRAGVQLEPFLHQVGGHMSVMKYDEHTVCKPLVSREQRFYESLPQAMKR 60 

.: :: . .:.* : : :: * * .*.: 

Tripsacum FLGTC— TRNSYVRDSSSENLPVFHSHMR— QESLWATGRHEVIHHVQT-TFRSLGTVYK 1 10 

Human FTPQYKGTVTVHLWKDSTGHLSLVANPVKESQEPFKVSTESAAVAIWQT-LQQTTGSNGS 1 19 

* *.::..*::*.:..:: **.:.:. . : ** : : *: 

Tripsacum S-NSHKWNEKWVHVNDRDLANNNVPSPYGVRDPKAIVE-TVSYSY-LT-AAPLLQGWG-S 165 

Human DCTLAQWPHAQLARSPKESPAKALLRSEPHLNTPAFSLVEDTNGNQVE — RKSFNPWGLQ 177 



:* 



: : 



• * • • 



Tr i psacu 1 1 1 ASEP YVGRVS ASYSSGRRPGKLRRD - -GETQLLPVGTDRKGG GDKLVKKQ A YCP 2 1 7 

Human CHQAHLTRLCSEYPENKRHRFLLLENVVSQYTHPCVLDLKHGTRQHGDDASEEKKARHHR 237 



* * * 



** 



Tripsacum TPKKQTKKTYAL-QQSAHPLLVASRFHPSP— FRDRRLI-YVQ-SSSDQSARTPDRLCPP 272 
Human KCAQSTSACLGVRICGMQVYQTDKKYFLCKDKYYGRKLSVEGFRQALYQFLHNGSHLRRE 297 



• * 



* ■ * 



* • * 



Tripsacum ILSRTKWNGSLILVTLCPDPSPHVRFYPPATRPTQH-GRPPPHSMLTRAGARFLGSPFPP 331 

Human LLEPILHQLRALLSVIRSQSS— YRFYSSSLLVIYD-GQEPPE RAPGSPHPH 3 Hi 

•* •* • • * »*« . »* * *** * 



Tripsacum RS-RPGWPACGSGNSPVPW-KKGWLDAGSTPRGAVRT-MISSRPLFAYR-GCLTPLRQLA 387 
Human EAPQAAHGSSPGGLTKVDIRMIDFAHTTYKGYWNEHTTYDGPDPGYIFG LENLIRIL 403 



■ • • ■ • • ^ • ^ 



• * * • • * * • • 



Tripsacum LPALSCL 
Human QDIQEGE 



110 



Figure 2 Alignment around PVGTDRKGG vs human Alignment between the 410 amino acids around the PVGTDRKGG sequence of 
Tripsacum dactyloides and the human IP6K gene (GustalW2). "*" = residues identical in the two sequences in the alignment; ":" = conserved 
substitutions; "." = semi-conserved substitutions. 



to be an outsider. This result indicated that the identi- 
fied mitochondrial tag does not belong to any subgroup 
of kinases composing the IPK gene family. 

Once excluded the presence of IP6K gene in mtDNA, 
we decided to look in nuclear genome of plants where, 
up to now, the search has been performed only by 
methods based on sequence similarity. We analyzed all 
chromosomes of Arabidopsis thaliana and Oryza sativa, 



a dicotyledonous and a monocotyledon plant respec- 
tively. In each chromosome, we found dozens of tags, 
but only few tag sequences per chromosome resulted in 
good candidates to be specific IP6K tags. In fact, too 
polar or too big amino acids between the four fixed 
positions of the tag are not consistent with the tag 
sequence functionality. In particular we considered as 
good candidate a tag sequence including amino acids L, 
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Hs - Homo sapiens 

Fly = Drosophrf.1 mctanogasfcr 

Pg = Pichti pMl iiiiwB 

Sc = Saootiaromyccs cercvteiae 

Tc ^Trypanosoma crab 

ZJ - Danio reno 



QJ 

Figure 3 Multiple alignment by ClustalW2 Phylogenetic tree from multiple alignment of a 50 amino acid region of Tripsacum dactyloides 
mtDNA surrounding the tag with corresponding regions of inositol phosphate kinase (IPMK, IP6K, IP3-3K) from different organism (Clustal W2). 
Branch lengths are proportional to the amount of inferred evolutionary change. 



V,T,M,I,A,S,G,C between the four fixed positions, and 
we rejected others. For each identified tag, we extracted 
a sequence of about 400 amino acids surrounding the 
tag. Each sequence was examined as a candidate IP6K 
gene as described above for mtDNA. We did not find 
any strong homology with known IP6Ks. This result was 
not surprising, because only a weak similarity is anyhow 
expected between organisms very distant from an evolu- 
tionary point of view. Thus, the selected sequence to be 
actually interesting was established on the basis of other 
parameters, like alignment of tag sequences, presence of 
other conserved amino acids and of sequence in EST 
database. A very promising sequence was found on 
chromosome 5 of Oryza sativa, around the tag 
PLLVDSKLG. The sequence comprises 198 amino acids 
without any stop codon. As shown in Figure 4, the Clus- 
talW alignment of this sequence and Saccharomyces 
cerevisiae KCS1 gene gave a positive score with an 
alignment in correspondence of the inositol-binding 
region. Alignment with human IP6K (hIP6K) gave a 
lower score, but still maintained the correspondence 



between tags (see Figure 5). We performed a multiple 
alignment (ClustalW2) between the region of 50 amino 
acids of Oryza sativa surrounding the tag and the corre- 
sponding regions of inositol phosphate kinases (IPMK, 
IP6K, IP3-3K) from different organisms. This analysis 
revealed (Figure 6) that the Oryza sativa sequence, 
although appear dissociated from other IPK family 
members shows a certain degree of similarity with Giar- 
dia lamblia IP6K, that itself appears to be a distant 
member of the IPK genes family. Finally, we screened 
the EST databases using the region surrounding the 
PLLVDSKLG tag of Oryza sativa. This search showed 
some matching EST, indicating that the tag sequence is 
likely to be transcribed in mRNA. 

Discussion and Conclusions 

Inositol hexaphosphate kinase (IP6K) catalyzes the con- 
version of IP6 to IP7 using ATP as phosphate donor. It 
belongs to an inositol polyphosphate kinase superfamily, 
the IPKs (Pfam PF03770), that evolved from a common 
ancestor. It is thought that a primordial IPMK may 
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SeqA Name Len(aa) SeqB Na 



Len(aa) Score 



CLUSTAL 2.0.12 multiple 

Yeast HDTSHEIHDKIPDTLREQQQHLRQKESEGCITTLKDLNVPETKKLSSVLHGRKASTYLRI 60 

Orvza s. 



Yeast 
Orvza s. 



Yeast 
( >rvza s. 



Yeast 
Oryza s. 



Yeast 
Oryza s. 



Yeast 
I Irvza . 



Yeast 
Oryza s. 



Yeast 
Oryza s. 



Yeast 
( >rvza s. 



Yeast FILSNHEEPAIKPNKHVAHRNNITETGqGSGEDIAKQQSHQPQVLHHQTSLKPIQNVDEG 180 

Oryza s. 



Yeast CISPKSTYQESLHGISEDLTLKPVSSAT' 

Oryza s. 



DIDTIQPATINC 210 



Yeast ASGIA' 
Oryza s. 



KPQQFVPHSESIKEKPNTFE :i(Kl 




NNDDIALNTEPTGTPLTHIHSFPLEHSSRQVLEKEHPEIESVHPHVKRSLSSSNQPSLLP 180 



MINTELKNLVIREVFAI 



ARLGSSPSFLTQKSRASSHDASNTS B(K1 
TTEGPRLDW 27 

«.:«.. 



Yeast HKTLGDSSSQASLQMDDSKVNPNLQDPFLKKSLHEKISNALDGSHSVMDLKQFHKNEQIK 660 

Oryza s. FCKLRD ELYLVFSLRTKIK — LDN — NLRNLNRTYSNECLK 64 

• **: **. **. :•:: :• 

Yeast HKNSFCNSLSPILTATNSRDDGEFATSPNYISNAt 
Oryza s. UES RPI QGPEDTAHYSTPTSRAPPRRP- 



SGKNHIIKSLAYNVSHDYSHHDIESITFEETSHTIVSKFILLEDLTRNMNK PCALDLKHG 

PRRRSQ PLLVDSK LG 

.:* :* *:* 

■-YYITRDKYFCRRVKVGW 
■ MARLHWLEAHLPLG- 

::*::: : :• 

QFARVLARFLYDGKTIESLIRQIPRLIKQLDTLYSEIFNLKGYRLYGASLLLMYDGDANK 

IIGGHLCIHGN 

:*.*:*: 



LIWNYLTSDKPLIFDEVEMNDf 
LIR EPLAF 



958 
190 



1018 
198 



Yeast NDPNSKLRQKWRKYELIFDAEPRYNDDAqVSD 1050 
Oryza s. 

Figure 4 Alignment around PLLVDSKLG vs yeast Alignment 
between the 198 amino acid sequence around the PLLVDSKLG tag 
of Oryza sativa and the yeast KCS1 gene (Clustal W2). "*" = residues 
identical in the two sequences in the alignment; ":" = conserved 
substitutions; "." = semi-conserved substitutions. In red the P-XXX-D- 
X-K-X-G tag. 



have been the evolutionary precursor of the IP3-3Ks 
and the IP6Ks, all of which contain the P-XXX-D-X-K- 
X-G motif [38]. Moreover, this motif represents a 
unique consensus sequence for the IPK family, with 
four key amino acids very conserved among different 
inositol phosphate kinases, despite their considerable 



sequence heterogeneity. This region modulates the cata- 
lytic site for phosphate transfer from ATP to the inosi- 
tol ring [39]. 

The inositol pyrophosphate IP7 is present in all eukar- 
yotic cells analyzed thus far, from amoeba to man; it is 
not surprising that the enzyme responsible for its synth- 
esis is highly conserved through evolution. In fact, after 
the first IP6K purification from rat brain [40], the 
enzyme was cloned in other mammalians, and its high 
evolutionary conservation was regularly observed, which 
facilitated the identification and cloning of IP6K 
enzymes from distant organisms, including yeast and 
the amoeba Dictyostelium[4\\. It is notable that Dictyos- 
telium diverted from the evolutionary main stream after 
the diversion of yeast but before the splitting between 
animals and plants [42]. Furthermore, the only IPK gene 
present in the ancient eukaryote diplomate Giardia lam- 
blia has been demonstrated to be a IP6K[43]. Thus, on 
the basis of evolutionary considerations, IP6K is 
expected to be found also in vegetal organisms. 

Moreover, pyrophosphate IP7 is present in vegetal 
organisms, and IP6-kinase enzymatic activity has been 
demonstrated in plants. However, bioinformatics analy- 
sis failed to identify any IP6 kinase in the complete Ara- 
bidopsis thaliana nuclear genome. We hypothesized that 
IP6K gene might actually occur nested in vegetal 
mtDNA, where more frequently phenomena enlarging 
protein variability do occur. Tag identification in 
mtDNA could indicate the presence of IP6K gene, even 
if not in a canonic form. Indeed, trans splicing mechan- 
isms might compact a gene consisting of more segments 
dislocated in different mtDNA regions, and editing phe- 
nomena could contribute to the failure of homology 
searches. In fact, editing mechanism might generate 
RNA molecules much different from DNA producing 
them, so that DNA sequence can be not immediately 
referable to IP6K gene in its transcript. Thus, the search 
of a gene starting from its characterizing consensus 
sequence represents a promising approach to find an 
encrypted gene. We searched for a specific IP6K tag 
within all available vegetal mtDNA sequences using 
DLSME, a very flexible system for motif discovery, 
allowing for dealing with genetic code degeneration and 
possible occurrences of editing events. Our search 
revealed several tags in mtDNA of examined plants, but 
an accurate analysis of sequences surrounding the con- 
sensus motifs led us to conclude that our hit does not 
belong to the IPK gene family. Indeed, the P-XXX-D-X- 
K-X-G consensus sequence is a characterizing motif of 
IP kinases, and it was found in all members of the 
family. Our search failed to find any sequence contain- 
ing the tag ascribable to IP6K gene and, thus, we can 
conclude that IP6K gene is not present in plant mtDNA. 
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Figure 5 Alignment around PLLVDSKLG vs human Alignment between the 198 amino acid sequence around the PLLVDSKLG tag of Oryza 
sativa and the human IP6K gene (Clustal W2). "*" = residues identical in the two sequences in the alignment; ":" = conserved substitutions; "." = 
semi-conserved substitutions. In red the P-XXX-D-X-K-X-G tag. 



Therefore, we decided to extend the search of IP6K 
gene on nuclear genome of plants. Up to now, IP6K 
gene search in plant chromosomes have been performed 
with bioinformatics methods based on sequence similar- 
ity. In our case, we looked at plant nuclear genome 
applying the new approach of gene identification by tag 
search. The advantage of this method is that it allows 
identification of a gene even if many nucleotidic 
changes have been accumulated during the evolution, so 
that the homology between homologous loci is now 
very low. In fact, it is known that IP6K is a gene highly 
conserved through mammalian evolution, but the 
homology is low when compared with organisms filo- 
genetically very distant, like Yeast. It is possible that in 
evolutionary stream bringing to plants, many nucleotidic 
changes occurred, so that plant IP6K gene looks quite 
different both from mammalian and yeast genes. By car- 
rying out our search, we found an interesting sequence 
in nuclear genome of Oryza sativa. This sequence 
shows an interesting similarity with yeast KCS1, giving 
a relatively high score when the two sequences are 
aligned using ClustalW. KCS1 gene is quite different 
from mammalian IP6K genes. It is bigger, comprising 



1052 amino acids against 410 of human IP6K, that lacks 
the first 305 KCS1 amino acids, and it has some other 
interruptions as compared to the yeast gene. Very inter- 
estingly, the homology region between Oryza sativa 
sequence and KCS1 is indeed clustered in the protein 
domain corresponding to human IP6K. This result 
might represent the strong evolution drive of the cataly- 
tic IPK domain and the likely conservation of the key 
feature of this domain in the identified Oryza sativa tag. 
Furthermore ClustalW alignment shows a correspon- 
dence between tags when we compare our sequence 
with both KCS1 and human or mouse IP6K. This corre- 
spondence is still maintained in multiple alignment 
between our sequence and KCS1, human IP6K and 
mouse IP6K. 

Multiple alignment of a 50 amino acid region of 
Oryza sativa DNA surrounding the tag with corre- 
sponding regions of IPKs from different organisms 
showed a degree of connection between Oryza sativa 
tag and Giardia lamblia IP6K sequences. Interestingly, 
among the different inositol phosphate kinases tested, 
the best match of Oryza sativa tag region was with a 
very distant IP6K. This result suggested that the 
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Figure 6 Phylogenetic tree A Phylogenetic tree from multiple alignment of a 50 amino acid region of Oryza sativa DNA surrounding the tag 
with corresponding regions of inositol phosphate kinase (IPMK, 1P6K, IP3-3K) from different organisms (ClustalW2). Branch lengths are 
proportional to the amount of inferred evolutionary change. 



sequence around the identified tag might represent a 
distant member of the IP6K subfamily of gene as the 
Giardia IP6K enzyme. 

As remarked above, the screening of EST databases 
showed some matching ESTs. Note that, although EST 
database are a very powerfull tools to study the trascrip- 
tome of a specific organism, they are often inperfect. 
Indeed, their quality is affected by transcript redun- 
dancy, low sequence quality and by high transcript trun- 
cation rates. Furthermore, these databases only 
represent the trascriptome of the tissue and develop- 
mental stages of the plant from which the mRNA was 
isolated. Thus, EST databases are not exhaustive, and a 
negative match does not exclude the expression of rare 
transcripts. This means that the ESTs we found indicate 
the chromosome region containing the putative plant 
IP6K is actively transcribed, although such ESTs do not 
possess the conserved PLLVDSKLG domain. Likely, the 



identified EST correspond to truncated isoform of the 
full length mRNA. 

In conclusion, we think that this sequence is part of 
an Oryza sativa gene homologous to mammalian IP6K. 
In particular we suppose that it is the central part of the 
gene, comprising the inositol binding site, and it lacks in 
the N-terminus and the C-terminus sequences, thus 
indicating the presence of more than one exons in the 
rice gene. The big evolutive distance between rice and 
both human and yeast could explain the low similarity 
observed among these gene. 

As future step of our research, we are planning experi- 
ments of molecular cloning and biochemical characteri- 
zation to confirm our hypothesis and to determine 
substrate specificity of the enzyme. We will use RT-PCR 
to clone Oryza sativa IP6K cDNA. The cDNA will be 
cloned into a yeast expression vector and the activity 
will be assessed through trans-complementation of 
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several yeast mutants, with particular focus on the yeast 
IP6 kinase mutant (kcslA). The recombinant enzymes 
will be tested in vitro by using either different [ 3 H] inosi- 
tol polyphosphates species or unlabelled inositol phos- 
phates with [y- 32 P]ATP to determine substrate 
specificity and calculate kinetic parameters. 

Additional material 



Additional file 1: Accession numbers of the genes referred in the 
figures. 
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